80 research outputs found

    Identification of new genes in Sinorhizobium meliloti using the Genome Sequencer FLX system

    Background: Sinorhizobium meliloti is an agriculturally important model symbiont, and there is an ongoing need to update and improve its genome annotation. In this study, we used a high-throughput pyrosequencing approach to sequence the transcriptome of S. meliloti and to search for new bacterial genes missed in the previous genome annotation. This is the first report of sequencing a bacterial transcriptome using pyrosequencing technology. Results: Our pilot sequencing run generated 19,005 reads with an average length of 136 nucleotides per read. From these data, we identified 20 new genes. These new gene transcripts were confirmed by RT-PCR, and their possible functions were analyzed. Conclusion: Our results indicate that high-throughput sequence analysis of bacterial transcriptomes is feasible and that next-generation sequencing technologies will greatly facilitate the discovery of new genes and improve genome annotation.

    Hepatoprotective mechanism of Silybum marianum on nonalcoholic fatty liver disease based on network pharmacology and experimental verification

    The study aimed to identify the key active components in Silybum marianum (S. marianum) and determine how they protect against nonalcoholic fatty liver disease (NAFLD). In this investigation, the TCMSP, DisGeNET, and UniProt databases and Venny 2.1 software were used to identify 11 primary active components, 92 candidate gene targets, and 30 core hepatoprotective gene targets. The protein-protein interaction (PPI) network was built using the STRING database and Cytoscape 3.7.2. KEGG pathway and GO biological process enrichment analyses and biological annotation of the identified core hepatoprotective gene targets were performed using the Metascape database. The effect of silymarin on NAFLD was assessed by H&E staining of pathological alterations in liver tissue. Liver function was assessed using biochemical tests. Western blotting was used to measure the expression of proteins in the hepatoprotective signaling pathways predicted by the network pharmacology analysis. KEGG enrichment analysis identified 35 hepatoprotective signaling pathways. GO enrichment analysis identified 61 biological processes related to the hepatoprotective effect of S. marianum, mainly involving regulation of biological processes and immune system processes. Silymarin, the major ingredient derived from S. marianum, exerted its hepatoprotective effect by reducing the levels of ALT, AST, TC, TG, HDL-C, and LDL-C, decreasing the protein expression of IL-6, MAPK1, Caspase 3, p53, and VEGFA, and increasing the protein expression of AKT1. The present study provides new insights into, and a possible explanation for, the molecular mechanisms by which S. marianum protects against NAFLD.

    Named Entity Recognition for Bacterial Type IV Secretion Systems

    Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have more than a dozen different names. Classifying research publications by biological process, cellular component, molecular function, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, such as genes and proteins. We developed recognizers for these entities based on an annotated corpus, large domain terminological resources, and machine learning techniques. High accuracy rates (>80%) were achieved for bacteria, biological processes, and molecular functions. Contrastive experiments highlighted the effectiveness of alternative recognition strategies, and results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.

    DNA binding mechanism revealed by high resolution crystal structure of Arabidopsis thaliana WRKY1 protein

    WRKY proteins, defined by the conserved WRKYGQK sequence, constitute a large superfamily of transcription factors identified specifically in the plant kingdom. This superfamily plays important roles in plant disease resistance, abiotic stress, and senescence, as well as in some developmental processes. In this study, Arabidopsis WRKY1 was shown to be involved in the salicylic acid signaling pathway and to be partially dependent on NPR1; a C-terminal domain of WRKY1, AtWRKY1-C, was constructed for structural studies. Previous investigations showed that DNA binding by WRKY proteins is localized to the WRKY domains and that these domains may define novel zinc-binding motifs. The crystal structure of AtWRKY1-C, determined at 1.6 Å resolution, reveals that this domain adopts a globular structure with five β strands forming an antiparallel β-sheet. A novel zinc-binding site is situated at one end of the β-sheet, between strands β4 and β5. Based on this high-resolution crystal structure and site-directed mutagenesis, we have defined and confirmed that the DNA-binding residues of AtWRKY1-C are located on strands β2 and β3. These results provide structural information for understanding the mechanism of transcriptional control and signal transduction events involving WRKY proteins.

    Analysis of tall fescue ESTs representing different abiotic stresses, tissue types and developmental stages

    Background: Tall fescue (Festuca arundinacea Schreb.) is a major cool-season forage and turf grass species grown in the temperate regions of the world. In this paper we report the generation of a tall fescue expressed sequence tag (EST) database developed from nine cDNA libraries representing tissues from different plant organs, developmental stages, and abiotic stress factors. The results of inter-library and library-specific in silico expression analyses of these ESTs are also reported. Results: A total of 41,516 ESTs were generated from nine cDNA libraries of tall fescue representing tissues from different plant organs, developmental stages, and abiotic stress conditions. The Festuca Gene Index (FaGI) has been established; to date, it is the first publicly available tall fescue EST database. In silico gene expression studies using these ESTs were performed to understand stress responses in tall fescue. A large number of ESTs of known stress-response genes were identified from stressed tissue libraries. These ESTs represent homologues of genes encoding heat-shock and oxidative stress proteins and members of various transcription factor protein families. Highly expressed ESTs representing genes of unknown function were also identified in the stressed tissue libraries. Conclusion: FaGI provides a useful resource for genomics studies of tall fescue and other closely related forage and turf grass species. Comparative genomic analyses between tall fescue and other grass species, including ryegrasses (Lolium sp.), meadow fescue (F. pratensis), and tetraploid fescue (F. arundinacea var. glaucescens), will benefit from this database. These ESTs are an excellent resource for the development of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) PCR-based molecular markers.

    PATRIC, the bacterial bioinformatics database and analysis resource

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Data types are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes; these include comparisons of different annotations, where available, as well as available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest, and a private workspace where they can store genome and gene associations as well as their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.

    Circulating tumor DNA clearance predicts prognosis across treatment regimen in a large real-world longitudinally monitored advanced non-small cell lung cancer cohort

    Background: Although the growth advantage of certain clones would ultimately translate into clinically visible disease progression, radiological imaging does not reflect clonal evolution at the molecular level. Circulating tumor DNA (ctDNA), validated as a tool for mutation detection in lung cancer, could reflect dynamic molecular changes. We evaluated the utility of ctDNA as a predictive and prognostic marker in disease monitoring of patients with advanced non-small cell lung cancer (NSCLC). Methods: This is a multicenter prospective cohort study. We performed capture-based ultra-deep sequencing of longitudinal plasma samples from 949 patients with advanced NSCLC harboring driver mutations, using a panel of 168 NSCLC-related genes, to monitor treatment responses and disease progression. Correlations between ctDNA and progression-free survival (PFS)/overall survival (OS) were analyzed in 248 patients who underwent various treatments and had at least 2 ctDNA tests. Results: Higher ctDNA abundance (P=0.012) and mutation count (P=8.5x10(-4)) at baseline were associated with shorter OS. Patients with ctDNA clearance, not just driver mutation clearance, at any point during the course of treatment had longer PFS (P=2.2x10(-16), HR 0.28) and OS (P=4.5x10(-6), HR 0.19), regardless of treatment type and evaluation schedule. Conclusions: This prospective real-world study shows that ctDNA clearance during treatment may serve as a predictive and prognostic marker across a wide spectrum of treatment regimens.

    Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics

    Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. Its ability to take blood meals from birds, livestock, and humans contributes to its capacity to transmit pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae, with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.

    Overview of the ID, EPI and REL tasks of BioNLP Shared Task 2011

    We present the preparation, resources, results and analysis of three tasks of the BioNLP Shared Task 2011: the main tasks on Infectious Diseases (ID) and Epigenetics and Post-translational Modifications (EPI), and the supporting task on Entity Relations (REL). The two main tasks represent extensions of the event extraction model introduced in the BioNLP Shared Task 2009 (ST'09) to two new areas of biomedical scientific literature, each motivated by the needs of specific biocuration tasks. The ID task concerns the molecular mechanisms of infection, virulence and resistance, focusing in particular on the functions of a class of signaling systems that are ubiquitous in bacteria. The EPI task is dedicated to the extraction of statements regarding chemical modifications of DNA and proteins, with particular emphasis on changes relating to the epigenetic control of gene expression. In contrast to these two application-oriented main tasks, the REL task seeks to support extraction in general by separating challenges relating to part-of relations into a subproblem that can be addressed by independent systems. Seven groups participated in each of the two main tasks and four groups in the supporting task. The participating systems showed advances in the capability of event extraction methods and demonstrated generalization in many aspects: from abstracts to full texts, from previously considered subdomains to new ones, and from the ST'09 extraction targets to other entities and events. The highest performance achieved in the supporting task REL, 58% F-score, is broadly comparable with levels reported for other relation extraction tasks. For the ID task, the highest-performing system achieved 56% F-score, comparable to state-of-the-art performance on the established ST'09 task. In the EPI task, the best result was 53% F-score for the full set of extraction targets and 69% F-score for a reduced set of core extraction targets, approaching a level of performance sufficient for user-facing applications. In this study, we extend previously reported results and perform further analyses of the outputs of the participating systems. We place specific emphasis on aspects of system performance relating to real-world applicability, considering alternative evaluation metrics and performing additional manual analysis of system outputs. We further demonstrate that the strengths of extraction systems can be combined to improve on the performance achieved by any system in isolation. The manually annotated corpora, supporting resources, and evaluation tools for all tasks are available from http://www.bionlp-st.org, and the tasks continue as open challenges for all interested parties.
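    As a point of reference for the F-scores reported in the BioNLP Shared Task 2011 abstract, the following is a minimal Python sketch of the conventional balanced F-score (F1), the harmonic mean of precision and recall computed from true-positive, false-positive, and false-negative counts. It assumes the shared-task scores follow this standard definition; the official evaluation tools' matching criteria are not reproduced here, and the function name `f_score` and the counts in the usage line are hypothetical.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw match counts (hypothetical helper).

    tp: predicted items that match the gold standard (true positives)
    fp: predicted items with no gold match (false positives)
    fn: gold items that were not predicted (false negatives)
    With beta = 1.0 this is the balanced F-score (F1): the harmonic
    mean of precision and recall.
    """
    if tp == 0:
        # No correct predictions: precision and recall are both 0 (or
        # undefined), so report 0.0 rather than dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision = 6/8 = 0.75, recall = 6/10 = 0.6
print(round(f_score(6, 2, 4), 3))  # prints 0.667
```

    Because F1 is a harmonic mean, it stays close to the lower of precision and recall, so a system cannot reach a high F-score by favoring only one of them.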